This paper describes a method of image segmentation that partitions the image into compact, homogeneous regions. It uses a parallel, iterative approach that does not require immediate forced choices. The approach makes use of a "pyramid" of successively reduced-resolution versions of the image. It defines link strengths between pairs of pixels at successive levels of this pyramid, based on proximity and similarity, and it iteratively recomputes the pixel values and adjusts the link strengths. After a few iterations, the link strengths stabilize, and the links that remain strong define a set of subtrees of the pyramid. Each such tree represents a compact (piece of a) homogeneous region in the image. The leaves of the subtree are the pixels in the region, and the size of the region depends on how high the root of the tree lies in the pyramid. Thus the trees define a partition of the image into (pieces of) homogeneous regions.

1. Introduction

Most of the existing methods of image segmentation [1,2] are based on forced-choice decisions. In methods that classify pixels into subpopulations, we must decide to which class each pixel belongs. In methods that partition the image into homogeneous regions using splitting and merging processes, we must decide, for each current region, whether to split it, or whether to merge it with a neighboring region (and if so, with which one).

This forced-choice aspect of segmentation is undesirable. Many of the decisions may be wrong, particularly when they are made on the basis of very little information, and it is difficult to undo the effects of wrong decisions.

In segmentation by pixel classification, a "relaxation" approach [3] can be used to defer the classification decisions until more information is available. In this approach we compute a degree of membership for each pixel in each class, or a "probability" that it belongs to each class. We then iteratively adjust these membership values based on the values at neighboring pixels and the compatibilities of the various possible combinations of class memberships of pairs of neighbors. After a few iterations, the membership values stabilize, with some values becoming or remaining relatively high and others becoming very low, so that it becomes easy to make the final classification decisions.

Segmentation by partitioning into homogeneous regions (e.g., regions of approximately constant value) is generally more powerful than segmentation by pixel classification. This is because the information on which it is based is computed over regions rather than ("myopically") over small neighborhoods of pixels. Thus it would be desirable to develop a region-based segmentation scheme in which decisions are not made immediately. This paper defines such a scheme and gives examples of the results obtained when it is applied to various types of images. Section 2 describes the general principles of this scheme and compares it with some related approaches; Section 3 discusses the algorithm; and Section 4 presents experimental results.

2. Weighted pyramid linking

Our approach to unforced image partitioning makes use of a "pyramid" of successively reduced-resolution versions of the given image, say of sizes $2^n$ by $2^n$, $2^{n-1}$ by $2^{n-1}$, ..., 2x2. The base of the pyramid (level 0) is the input image, and each successive level is constructed by averaging 4 by 4 blocks of pixels on the level below, where the blocks overlap 50% in x and in y. (For convenience, each level is regarded as cyclically closed, so that its top row is adjacent to its bottom row and its left column to its right column.) Thus each pixel on a given level has 16 "sons" on the level below (if any) that contribute to its average, and 4 "fathers" on the level above (if any) to whose average it contributes. This type of pyramid has also been used for segmentation purposes by other investigators; e.g., see the work of Hanson and Riseman described in [4].
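The initial pyramid construction can be sketched in pure Python as below. This is a minimal sketch under our own assumptions, not the authors' code. We assume the image is a square list of lists whose side is a power of 2 (at least 4). We also assume the father at (i, j) on one level averages rows 2i-1 through 2i+2 and columns 2j-1 through 2j+2 of the level below, taken cyclically. The paper does not state an indexing convention, and the helper names `sons`, `father_rows`, `fathers` and `build_pyramid` are hypothetical.

```python
def sons(i, j, size_below):
    """Return the 16 (row, col) sons on the level below of the father at (i, j).

    Assumed convention: the father's 4x4 block covers rows 2i-1 .. 2i+2 and
    columns 2j-1 .. 2j+2 of the level below, wrapped cyclically (each level is
    treated as a torus). Blocks of adjacent fathers overlap by 2 rows/columns.
    """
    return [((2 * i + di) % size_below, (2 * j + dj) % size_below)
            for di in range(-1, 3) for dj in range(-1, 3)]


def father_rows(r, size_above):
    """Return the two father rows (or columns) whose blocks contain son row r."""
    k = r // 2
    other = k - 1 if r % 2 == 0 else k + 1
    return [k % size_above, other % size_above]


def fathers(r, c, size_above):
    """Return the 4 fathers on the level above whose 4x4 blocks contain son (r, c)."""
    return [(fi, fj) for fi in father_rows(r, size_above)
            for fj in father_rows(c, size_above)]


def build_pyramid(image):
    """Build level 0 (the input) up to the 2x2 top level by unweighted 4x4 averaging."""
    levels = [[[float(x) for x in row] for row in image]]
    while len(levels[-1]) > 2:
        below = levels[-1]
        n_below = len(below)
        n_above = n_below // 2
        levels.append([[sum(below[r][c] for r, c in sons(i, j, n_below)) / 16.0
                        for j in range(n_above)]
                       for i in range(n_above)])
    return levels


if __name__ == "__main__":
    img = [[4 * r + c for c in range(4)] for r in range(4)]
    # On a 4x4 base, the 16 sons of every top-level father are the whole image.
    print(build_pyramid(img)[1])  # [[7.5, 7.5], [7.5, 7.5]]
    # Consistency check: each son lies in the block of each of its 4 fathers.
    for r in range(8):
        for c in range(8):
            assert all((r, c) in sons(fi, fj, 8) for fi, fj in fathers(r, c, 4))
```

On a 64x64 input this sketch yields six levels, with the 2x2 top at level 5, which agrees with the experimental setup described in Section 4.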

The basic idea in our approach is to define link strengths between "neighboring" pixels (i.e., father/son pairs) on adjacent levels of the pyramid, based on the similarity (in value) and proximity (in (x,y) coordinates) of each such pair. We then recompute the pixel values (at the levels above the base) as weighted averages of their sons' values, where the weights depend on the link strengths. These new values define new link strengths, and the process is iterated. (The details of the algorithm will be given in the next section.) After a few iterations, the link strengths stabilize, and the links that remain strong define subtrees of the pyramid. As it turns out, each such tree defines a compact homogeneous region in the image. The leaves of the tree are the pixels belonging to the region, and the height of the tree corresponds to the region size (the larger the region, the higher the root of its tree lies in the pyramid). Thus the weighted links can be used to define a partition of the image into compact homogeneous regions. Note that this partition is not defined immediately, but only after the link weights have stabilized.


To see intuitively why this approach should work, consider the case of a homogeneous compact region on a homogeneous background. Pixels in the interior of the region (or background) will link strongly to all their fathers, since these fathers' values are averages of image blocks that lie in the same region. A pixel near the region border, however, will link more strongly to a father that lies inside the region than to one that lies partly outside, since it is more similar in value to the former. Thus when we recompute the fathers' values, a father whose image block lies mostly inside the region will get closer in value to the region average, since it is more strongly linked to its sons that lie in the region than to those that lie in the background; and conversely. This makes its links to the former sons even stronger, and to the latter even weaker, so that the link strengths and values should converge. Now consider a pixel whose block lies mostly inside the region, but whose fathers' blocks all lie mostly outside, because they are bigger than the region. By the argument just given, the pixel's value should tend toward the region average, while its fathers' values should tend toward the background average. The pixel therefore does not remain strongly linked to any of its fathers, and becomes the root of a tree representing (a compact portion of) the region.

It is of interest to compare this approach to some earlier segmentation schemes based on pyramid linking or on link strengths. In [5-6] link strengths are computed between each father/son pair, but we keep only the strongest of the four links between a pixel and its fathers. We then recompute the pixel values allowing only those sons that are linked to a pixel to contribute to its value; recompute the link strengths based on these new values; and iterate the process. Note that in this scheme every pixel must link to one of its fathers. Thus the links define precisely four trees, rooted at the top (2x2) level, so that the image is segmented into precisely four sets of pixels. These sets do not correspond to compact regions, but do tend to correspond to homogeneous subpopulations of pixels. Thus the segmentation scheme of [5-6] is more like a pixel clustering and classification scheme than an image partitioning scheme. It also makes forced choices immediately, since it keeps only the strongest upward link from each pixel. Extensions of this scheme to segmentation based on color or texture, and to waveform or contour segmentation, are described in [7-10].

A pyramid linking method which does make use of all the link strengths, rather than discarding all but the strongest upward link, is described in [11]. However, in this method the link strengths are normalized so that they sum to 1. Thus here too the links are forced to extend upward from every pixel (divided among its fathers appropriately) all the way to the top level. In fact, the link strengths tend to converge to 0 or 1 when the process is iterated, so that this method too defines a segmentation of the image into four subpopulations of pixels, rather than a partition into regions.


A weighted pixel linking scheme not involving a pyramid is described in [12]. Here a link strength is computed for each pair of neighboring pixels based on their closeness in value. The image is then smoothed by replacing each pixel with the average of its neighbors, weighted by their link strengths. Using these new values, the link strengths are recomputed, and the process is iterated. This tends to produce a very high-quality smoothing, and the links that remain strong could be used to define a segmentation of the image into homogeneous regions. However, this method would not always be reliable, since it is based on small neighborhoods. The method defined in this paper is analogous to the scheme in [12], but it uses "vertical" links (between fathers and sons) in a pyramid, rather than "horizontal" links (between brothers) in an image at a single resolution. Our method could be generalized to make use of horizontal as well as vertical link strengths, but we shall not pursue this possibility here.

3. The algorithm

The algorithm is initialized, as mentioned earlier, by building the pyramid using unweighted averaging of 4x4 blocks that overlap 50% horizontally and vertically. Alternatively, we could use nonoverlapping 2x2 blocks (for the initialization only; a pixel still has 16 sons in the subsequent steps), or we could use the median instead of the mean. However, these variations were found to make little difference in the results.

Let $v(P)$ denote the value of pixel $P$ in the pyramid, say on level $\ell$. Initially, if $\ell = 0$ this is the gray level of an input pixel, and if $\ell > 0$ it is the mean of the values of $P$'s 16 sons. Let $\sigma(P)$ be the standard deviation of these sons' values (or, if $\ell = 0$, we take $\sigma$ to be a constant; we used 5 in our experiments).

Let $P^*$ be one of the fathers of $P$. The link strength between $P$ and $P^*$ is defined by

$$w(P,P^*) = \bigl(1 + d(P,P^*)\bigr)\,\frac{1}{\sqrt{2\pi}\,\sigma(P)}\,\exp\!\left(-\frac{1}{2}\left[\frac{v(P)-v(P^*)}{\sigma(P)}\right]^2\right)$$
In this expression, the first factor depends on the distance between (the centers of) $P$ and $P^*$; $d$ is taken to be 3 for the closest father, 1 for the farthest, and $\sqrt{5}$ for the other two. (It can be verified that these values are proportional to the Euclidean distances between the centers, but assigned in reverse order, so that the closest father receives the largest value of $d$.) This factor makes the sets of pixels that belong to a given tree more compact; if it is omitted, these sets become more irregular in shape. The factors involving $\sigma(P)$ reflect the (non)variability of the sons of $P$: if they are highly variable, $P$ does not link strongly to any of its fathers. Finally, the exp factor depends on the similarity in value of $P$ and $P^*$: if they are very dissimilar, the link is weak.
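The link-strength definition can be sketched as follows. This is a minimal illustration, not the authors' code. The helper names `d_factor` and `link_strength` are hypothetical. We assume, as our own reading of the geometry, that the offsets between son and father centers along each axis are 0.5 or 1.5 son-pixel widths, which is what 4x4 blocks overlapping by 50% produce. The first print checks the 1 : √5 : 3 distance ratio mentioned above.

```python
import math


def d_factor(dx, dy):
    """The d value for a son/father pair.

    dx, dy: absolute offsets (in son-pixel units) between the son's center and
    the father's center; each is 0.5 (near) or 1.5 (far) under our assumption.
    d = 3 for the closest father, sqrt(5) for the two intermediate ones, and
    1 for the farthest, as stated in the text.
    """
    n_far = (dx > 1.0) + (dy > 1.0)
    return (3.0, math.sqrt(5.0), 1.0)[n_far]


def link_strength(v_son, v_father, sigma_son, d):
    """w(P, P*) = (1 + d) * exp(-z**2 / 2) / (sqrt(2*pi) * sigma(P)),
    with z = (v(P) - v(P*)) / sigma(P). Requires sigma_son > 0."""
    z = (v_son - v_father) / sigma_son
    return (1.0 + d) * math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma_son)


# The three center-to-center distances are in the ratio 1 : sqrt(5) : 3.
dists = [math.hypot(0.5, 0.5), math.hypot(0.5, 1.5), math.hypot(1.5, 1.5)]
print([round(x / dists[0], 6) for x in dists])  # [1.0, 2.236068, 3.0]

# Equal values, sigma = 5 (the constant used at level 0 in the experiments).
w_near = link_strength(100.0, 100.0, 5.0, d_factor(0.5, 0.5))
w_far = link_strength(100.0, 100.0, 5.0, d_factor(1.5, 1.5))
print(round(w_near, 6), round(w_far, 6))  # 0.319154 0.159577
```

For equal values the closest father's link is exactly twice as strong as the farthest one's, since the distance factor ranges from 4 down to 2. A larger $\sigma(P)$ or a larger value difference weakens every link.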

We now want to recompute the pixel values at levels $> 0$ as weighted averages of their sons' values, where the weights depend on the link strengths. Note first that the weight given to a son must also depend on the (weighted) "area" of the image represented by that son. For example, suppose one son had unit-strength links (down through successive levels) to a single image pixel, and zero strengths to all its other descendants. We would not want to give it as much weight as a son that had high-strength links to many image pixels. Let $a(P)$ be the "area" of pixel $P$. Initially, for a pixel at level $\ell$, we have $a(P) = 2^{2\ell}$, since $P$ represents a $2^\ell$ by $2^\ell$ image block. Subsequently, let $a(P')$ be the area of a son $P'$ of $P$, and let $w(P',P)$ be the link strength between them. Then

$$a(P) = \sum_{P'} \frac{w(P',P)\,a(P')}{W(P')}, \qquad W(P') = \sum_{P'^*} w(P',P'^*),$$

where the first sum is over the sons $P'$ of $P$, and the second is over the fathers $P'^*$ of $P'$. Note that in computing $a(P)$ we are actually using normalized weights, i.e., $\sum_{P'^*} w(P',P'^*)/W(P') = 1$. This is because it seems reasonable that the "area" of a pixel should be distributed among its fathers in a normalized fashion. This ensures that the total "area" of all pixels at a given level remains equal to the area of the image.
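The area rule above can be sketched as follows, as a minimal illustration rather than the authors' code. The function name `father_area` and the toy link strengths are hypothetical. For brevity each son in the toy example has only two fathers instead of four. Skipping sons whose links all vanish ($W(P') = 0$) is our own assumption, since the text does not cover that case.

```python
def father_area(son_links):
    """a(P) = sum over the sons P' of P of w(P', P) * a(P') / W(P').

    son_links: list of (a_son, w_son_to_P, W_son) tuples, where W_son is the
    sum of that son's link strengths to all of its fathers. A son with
    W_son == 0 links to no father, so (our assumption) it passes no area up.
    """
    return sum(a * w / W for a, w, W in son_links if W > 0)


# Hypothetical toy example: two sons of area 1, each linked to fathers F0 and F1.
links = {  # son -> {father: w(son, father)}
    "s0": {"F0": 0.3, "F1": 0.1},
    "s1": {"F0": 0.2, "F1": 0.2},
}
area = {"s0": 1.0, "s1": 1.0}
W = {s: sum(ws.values()) for s, ws in links.items()}

a_F0 = father_area([(area[s], links[s]["F0"], W[s]) for s in links])
a_F1 = father_area([(area[s], links[s]["F1"], W[s]) for s in links])
print(round(a_F0, 6), round(a_F1, 6), round(a_F0 + a_F1, 6))  # 1.25 0.75 2.0
```

Because each son's area is split among its fathers in proportion to its normalized link strengths, the fathers' areas sum to the total son area (2.0 here). This is the conservation property the text gives as the reason for normalizing.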


Finally, the new value of pixel $P$ is given in terms of its sons' values by

$$v(P) = \frac{\sum_{P'} v(P')\,a(P')\,w(P',P)}{\sum_{P'} a(P')\,w(P',P)},$$

where the sums are over the sons $P'$ of $P$. Similarly, the new standard deviation is given by

$$\sigma(P) = \left[\frac{\sum_{P'} \bigl(v(P)-v(P')\bigr)^2\,a(P')\,w(P',P)}{\sum_{P'} a(P')\,w(P',P)}\right]^{1/2}.$$

The process is iterated; in our experiments, only two or three iterations were necessary.

After the desired number of iterations, we call a pixel a "root" if it is on the top level (2x2), or if the sum of its link strengths to all its fathers is negligible (in our experiments: $< 10^{-5}$). The nonroot pixels are then assigned to trees by using only their most strongly linked fathers.

4. Experiments

The algorithm just described was applied to the three images shown in Figure 1: photomicrographs of some chromosomes (right) and blood cells (left), and an infrared image of a tank. Each image is 64x64 pixels; thus the top (2x2) level of the pyramid is level 5. At each iteration, the gray level displayed for each pixel is the value at the root of its tree. We see that even after a single iteration, the trees define a decomposition of the image into regions having a small set of values. In one or two more iterations the set of values is reduced even further.

Table 1 lists the root nodes at each level, and their values, for each image for as many iterations as were needed until there was no further change in the set of roots. We see that the more complex the image, the more iterations are required for the set of roots to stabilize. However, even for the most complex image, the changes after the first two or three iterations have little effect on the segmentation of the image.

Figure 2 shows printouts of the displayed images after the first (parts a-c) and last (parts d-f) iterations. The label printed in each region identifies the root of the tree to which it belongs: the digit is the level, and the letters are used to distinguish the roots at that level. We see that after a few iterations, the leaves of each tree define a small set of compact regions. As Table 1 indicates, regions that are compact pieces of a single homogeneous region have nearly the same value. Note that because of the coordinate wraparound, regions on opposite sides of the image may belong to the same tree.
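The value and standard-deviation updates and the root rule of Section 3 can be sketched as below, along with the root-value display used in these experiments. This is a minimal illustration under our own assumptions, not the authors' implementation. Nodes are hypothetical string labels, father links are given as a plain dict, and the toy link strengths and root values are made up. The threshold 1e-5 follows our reading of the experimental setting.

```python
import math

ROOT_THRESHOLD = 1e-5  # "negligible" sum of a node's links to its fathers


def update_value_and_sigma(sons):
    """New v(P) and sigma(P) from P's sons, each weighted by a(P') * w(P', P).

    sons: list of (v_son, a_son, w_son_to_P). Assumes some son has nonzero
    weight. A result of sigma == 0 would make the next link-strength
    computation undefined; the text does not say how that case is handled.
    """
    weights = [a * w for _, a, w in sons]
    total = sum(weights)
    v = sum(vs * wt for (vs, _, _), wt in zip(sons, weights)) / total
    var = sum((v - vs) ** 2 * wt for (vs, _, _), wt in zip(sons, weights)) / total
    return v, math.sqrt(var)


def find_roots(father_links, top_level):
    """Roots: all top-level nodes, plus nodes whose father links sum below the threshold."""
    roots = set(top_level)
    roots.update(n for n, links in father_links.items()
                 if sum(links.values()) < ROOT_THRESHOLD)
    return roots


def root_of(node, father_links, roots):
    """Follow the most strongly linked father upward until a root is reached."""
    while node not in roots:
        links = father_links[node]
        node = max(links, key=links.get)
    return node


# Hypothetical toy pyramid: top nodes T0, T1; middle nodes m0, m1; pixels p0..p2.
father_links = {
    "m0": {"T0": 0.4, "T1": 0.1},
    "m1": {"T0": 1e-7, "T1": 2e-7},  # links to both fathers have become negligible
    "p0": {"m0": 0.5, "m1": 0.2},
    "p1": {"m0": 0.1, "m1": 0.3},
    "p2": {"m0": 0.05, "m1": 0.01},
}
roots = find_roots(father_links, ["T0", "T1"])
value = {"T0": 10.0, "T1": 50.0, "m1": 200.0}  # hypothetical root values
# Each pixel is displayed with the value at the root of its tree.
display = {p: value[root_of(p, father_links, roots)] for p in ("p0", "p1", "p2")}
print(sorted(roots), display)
# ['T0', 'T1', 'm1'] {'p0': 10.0, 'p1': 200.0, 'p2': 10.0}

print(update_value_and_sigma([(10.0, 1.0, 0.3), (20.0, 1.0, 0.1)]))  # approximately (12.5, 4.3301)
```

Here m1 becomes a root below the top level because its links to both fathers are negligible, so pixel p1, whose strongest link is to m1, is displayed with m1's value rather than a top-level value.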


5. Concluding remarks

We have exhibited a method of segmenting an image into compact homogeneous regions by constructing links between "neighboring" pixels at consecutive levels of a "pyramid".

An important feature of this method is that each region is represented by a tree having the pixels of the region as leaves. The height of this tree is proportional to the log of the region size. Thus, even for large regions, all the pixels in the region are relatively closely linked to the root of the tree, and hence to each other. The pyramid structure makes it possible for information to propagate between different parts of a region relatively rapidly. Moreover, the root of the tree can be used as a node to represent the region in various region-level relational structures. Thus the tree constitutes a transition between the pixel-level representation of the region and more abstract representations.

Another important feature of our method is that the trees are produced by a cooperative process in which link strengths are iteratively adjusted. Under this process, root pixels representing regions become easy to recognize, because their link strengths to their fathers all become negligible. They are harder to recognize in the original pyramid. There the pixels (especially at higher levels) represent mixtures of image pixels, so the link strengths are not initially negligible.

Image processing and segmentation techniques based on "local" operations performed in a pyramid can be implemented very rapidly in parallel on a tree-structured cellular processor [13]. It is possible that processes of this type also play a role in biological visual systems, where the input image is represented at a range of resolutions [14].
Figure 2. Results after one iteration (parts a-c) and results after the last iteration (parts d-f).

Table 1. Root nodes and their values at each iteration for the three images. The root labels (see Figure 2) are given only for the first and last iterations.
(a) Cell image; there were no changes in the set of root nodes after the third iteration.
(b) Tank image; no changes after the second iteration. Note that one of the roots is a single pixel.